In this chapter, we conduct an exploratory data analysis (EDA) to investigate physiological responses to stress. We focus on comparing baseline states with stress conditions and analyzing differences across laboratory and real-world contexts.

Show Code
library(tidyverse)
library(ggridges) # For ridge plots
library(viridis)  # For color accessible palettes
library(patchwork) # For combining plots

# 1. Load the cleaned data
wesad <- read_csv("data/clean/wesad_all_subjects.csv")
road  <- read_csv("data/clean/road_hr_all.csv")
swell <- read_csv("data/clean/swell_processed.csv")

# 2. Set a consistent theme for "Audience Ready" style
my_theme <- theme_minimal(base_size = 14) +
  theme(
    plot.title = element_text(face = "bold", size = 16),
    plot.subtitle = element_text(color = "grey30", size = 12),
    legend.position = "top",
    panel.grid.minor = element_blank()
  )

3.1 3.1 Physiological Signatures of Stress (WESAD)

We first analyze the WESAD dataset to establish a “ground truth” for what stress looks like physiologically compared to baseline and amusement states.

3.1.1 EDA: The Primary Stress Indicator

Electrodermal Activity (EDA), also known as skin conductance, is a key indicator of physiological arousal. We expect stress to trigger significantly higher EDA levels.

Show Code
# Filter for key conditions (removing Meditation for clearer contrast)
wesad_main <- wesad %>% 
  filter(Condition %in% c("Baseline", "Stress", "Amusement"))

# Graph 1: EDA Boxplot
ggplot(wesad_main, aes(x = Condition, y = EDA, fill = Condition)) +
  geom_boxplot(outlier.alpha = 0.1, alpha = 0.8) +
  scale_fill_brewer(palette = "RdBu", direction = -1) +
  scale_y_continuous(limits = c(0, 25)) + # Zoom in to ignore extreme outliers for better visibility
  labs(
    title = "Skin Conductance (EDA) by Condition",
    subtitle = "Stress elicits significantly higher EDA than Baseline",
    y = "EDA (μS)",
    x = NULL
  ) +
  my_theme + theme(legend.position = "none")

Comparison of EDA across conditions. Stress shows a higher median and wider interquartile range.

3.1.2 Body Temperature Response

Does skin temperature drop or rise during stress? The “cold hands” phenomenon suggests a drop due to vasoconstriction, but metabolic activity might suggest a rise.

Show Code
# Graph 2: Temperature Violin Plot
ggplot(wesad_main, aes(x = Condition, y = Temp, fill = Condition)) +
  geom_violin(trim = FALSE, alpha = 0.7) +
  stat_summary(fun = median, geom = "point", size = 2, color = "black") +
  scale_fill_brewer(palette = "RdBu", direction = -1) +
  labs(
    title = "Skin Temperature Distribution",
    subtitle = "Temperature varies less dramatically than EDA",
    y = "Temperature (°C)",
    x = NULL
  ) +
  my_theme

Violin plot of Skin Temperature. Note the subtle shifts in distribution.

3.2 3.2 Individual Variations in Heart Rate

Physiological responses are highly personal. A stressor that spikes one person’s heart rate might not affect another’s as severely. Here we use Ridge Plots to visualize the heart rate (ECG) distribution for each subject.

Show Code
# Graph 3: ECG Ridge Plot
ggplot(wesad_main, aes(x = ECG, y = Subject, fill = Condition)) +
  geom_density_ridges(alpha = 0.6, scale = 0.9, rel_min_height = 0.01) +
  scale_fill_manual(values = c("gray70", "#E41A1C", "#377EB8")) +
  labs(
    title = "ECG Signal Distribution by Subject",
    subtitle = "Comparing Baseline (Gray), Stress (Red), and Amusement (Blue)",
    x = "ECG Amplitude (Normalized)",
    y = "Subject ID"
  ) +
  theme_ridges() +
  theme(legend.position = "top")

ECG signal distributions for each subject. S2 and S3 show distinct shifts.

3.3 3.3 Real-World Context: Driving Stress

How does heart rate variability look in a real-world driving task? We analyze the AffectiveROAD dataset to see heart rate patterns during different driving sessions.

Show Code
# Graph 4: Driving HR Bar Chart
# Summarize data by DriveID
road_summary <- road %>%
  group_by(DriveID) %>%
  summarise(
    Mean_HR = mean(HR, na.rm = TRUE),
    SD_HR = sd(HR, na.rm = TRUE) # SD is a proxy for HRV (Heart Rate Variability)
  ) %>%
  arrange(desc(Mean_HR))

ggplot(road_summary, aes(x = reorder(DriveID, Mean_HR), y = Mean_HR)) +
  geom_col(fill = "steelblue", width = 0.7) +
  geom_errorbar(aes(ymin = Mean_HR - SD_HR, ymax = Mean_HR + SD_HR), width = 0.2, color = "grey30") +
  coord_flip() +
  labs(
    title = "Heart Rate during Driving Sessions",
    subtitle = "Error bars represent Standard Deviation (Proxy for HRV)",
    y = "Heart Rate (BPM)",
    x = "Driving Session"
  ) +
  my_theme

Average Heart Rate across different driving sessions.

3.4 3.4 Multi-modal Correlations

Do physiological signals synchronize during stress? We examine the correlation between Heart Rate (ECG), Skin Conductance (EDA), and Body Temperature (Temp) to see if they move in tandem.

Show Code
# 1. Select numeric physiological columns
cor_data <- wesad_main %>%
  select(EDA, Temp, ECG)

# 2. Compute correlation matrix
cor_matrix <- cor(cor_data, use = "complete.obs")

# 3. Transform to long format for ggplot
cor_melted <- cor_matrix %>%
  as.data.frame() %>%
  rownames_to_column(var = "Var1") %>%
  pivot_longer(cols = -Var1, names_to = "Var2", values_to = "Correlation")

# 4. Plot Heatmap
ggplot(cor_melted, aes(x = Var1, y = Var2, fill = Correlation)) +
  geom_tile(color = "white") +
  geom_text(aes(label = round(Correlation, 2)), color = "black", size = 4) +
  scale_fill_gradient2(low = "#377EB8", mid = "white", high = "#E41A1C", 
                       midpoint = 0, limit = c(-1, 1)) +
  labs(
    title = "Correlation of Physiological Signals",
    subtitle = "EDA and Temperature show distinct relationship patterns compared to ECG",
    x = "", y = ""
  ) +
  theme_minimal() +
  coord_fixed()

Correlation Matrix of Physiological Signals. Red indicates positive correlation, Blue indicates negative.

3.5 3.5 Temporal Dynamics: Stress & Recovery Rhythms

A key goal of this project is to explore the daily rhythms of stress. Here, we zoom in on a single subject (S2) to visualize how their physiological state evolves over time, transitioning from Baseline to Stress and finally to Amusement (Recovery).

Show Code
# 1. Filter for Subject S2 and create a Time axis
# Since we downsampled to ~10Hz, each row is approx 0.1 seconds.
subject_s2 <- wesad %>%
  filter(Subject == "S2", Condition %in% c("Baseline", "Stress", "Amusement")) %>%
  mutate(Time_Sec = row_number() / 10) # Create time in seconds

# 2. Visualization
# We want to highlight the "Stress" period. 
# Let's plot EDA over time, coloring the line by Condition.

ggplot(subject_s2, aes(x = Time_Sec, y = EDA, color = Condition)) +
  geom_line(linewidth = 0.8) +
  scale_color_manual(values = c("Baseline" = "gray50", "Stress" = "#E41A1C", "Amusement" = "#377EB8")) +
  labs(
    title = "Physiological Stress Response Over Time (Subject S2)",
    subtitle = "Tracking Electrodermal Activity (EDA) through different experimental phases",
    x = "Time (seconds)",
    y = "Skin Conductance (μS)",
    caption = "Red indicates the Stress phase, clearly marked by a surge in EDA."
  ) +
  my_theme +
  theme(legend.position = "bottom")

Temporal evolution of EDA and Temperature for Subject S2. The shaded regions represent the Stress condition.

3.6 3.6 Comparison with Office Stress (SWELL)

Finally, we briefly examine the SWELL-KW dataset to understand the feature distribution in an office setting.

Show Code
# Quick inspection of SWELL dataset features
# We select numeric columns that might represent heart rate or skin conductance
# Note: SWELL column names depend on the specific feature set used. 
# Here we visualize the first few available numeric columns as an exploratory step.

# Select only numeric columns for a quick overview
swell_numeric <- swell %>%
  select(where(is.numeric)) %>%
  pivot_longer(cols = everything(), names_to = "Feature", values_to = "Value") %>%
  # Just pick top 4 features to plot to avoid overcrowding
  filter(Feature %in% unique(Feature)[1:4])

ggplot(swell_numeric, aes(x = Value)) +
  geom_histogram(bins = 30, fill = "forestgreen", alpha = 0.7) +
  facet_wrap(~Feature, scales = "free") +
  labs(
    title = "Feature Distributions in SWELL (Office) Dataset",
    subtitle = "Exploratory histograms of physiological/behavioral features",
    x = "Value",
    y = "Count"
  ) +
  my_theme

3.7 3.7 Multidimensional Stress Separation

Can we distinguish “Stress” from “Baseline” by examining multiple physiological signals simultaneously? Here we visualize the 2D feature space plotting Temperature against EDA to reveal how stress occupies a distinct physiological region.

Show Code
# Create 2D scatter plot with density contours
ggplot(wesad_main, aes(x = Temp, y = EDA, color = Condition)) +
  geom_point(alpha = 0.25, size = 2) + 
  stat_ellipse(level = 0.95, linewidth = 1.2, linetype = "dashed") + 
  scale_color_manual(
    values = c("Baseline" = "#4DAF4A", "Stress" = "#E41A1C", "Amusement" = "#377EB8"),
    labels = c("Baseline (Calm)", "Stress (Aroused)", "Amusement (Relaxed)")
  ) +
  labs(
    title = "Physiological Feature Space: Temperature vs. EDA",
    subtitle = "Stress is characterized by elevated EDA with moderate temperature variation",
    x = "Skin Temperature (°C)",
    y = "Electrodermal Activity (μS)",
    color = "Experimental Condition"
  ) +
  my_theme +
  theme(
    legend.position = "bottom",
    legend.title = element_text(face = "bold")
  ) +
  guides(color = guide_legend(override.aes = list(alpha = 1, size = 4)))

Scatter plot of Skin Temperature vs. EDA. Stress (Red) clusters in a distinct region with elevated EDA levels.

3.7.1 Interpretation

The scatter plot reveals clear clustering patterns in the physiological feature space:

  • Stress (Red): High EDA values (>5 μS) with temperature clustered around 32-34°C
  • Baseline (Green): Lower EDA (<5 μS) and slightly cooler temperature
  • Amusement (Blue): Moderate EDA with the widest temperature distribution

The 95% confidence ellipses show minimal overlap between Stress and Baseline, suggesting these signals could effectively classify stress states in a machine learning model.


3.8 3.8 Cross-Context Comparison: Laboratory vs. Real-World

How do physiological responses differ between controlled laboratory settings and naturalistic environments? We compare heart rate patterns from the WESAD lab study with the AffectiveROAD driving dataset.

Show Code
# Prepare lab data (WESAD ECG - proxy for heart activity)
lab_data <- wesad_main %>% 
  filter(Condition == "Stress") %>%
  select(ECG) %>% 
  mutate(
    Context = "Laboratory (WESAD)",
    Signal_Type = "ECG Amplitude"
  ) %>%
  rename(Value = ECG)

# Prepare real-world data (AffectiveROAD Heart Rate)
realworld_data <- road %>% 
  select(HR) %>% 
  mutate(
    Context = "Real-World (Driving)",
    Signal_Type = "Heart Rate (BPM)"
  ) %>%
  rename(Value = HR) %>%
  filter(!is.na(Value))

# Combine datasets
comparison_data <- bind_rows(lab_data, realworld_data)

# Create side-by-side density plots
p1 <- ggplot(comparison_data, aes(x = Value, fill = Context)) +
  geom_density(alpha = 0.6, linewidth = 1) +
  scale_fill_manual(values = c("Laboratory (WESAD)" = "#FF6B6B", 
                                 "Real-World (Driving)" = "#4ECDC4")) +
  facet_wrap(~Context, scales = "free", ncol = 2) +
  labs(
    title = "Physiological Signal Distributions: Lab vs. Real-World Contexts",
    subtitle = "Comparing controlled stress induction (WESAD) with naturalistic driving stress (AffectiveROAD)",
    x = "Signal Value (context-specific units)",
    y = "Probability Density",
    caption = "Note: Different measurement modalities (ECG vs. HR) and units - focus on distribution shape"
  ) +
  my_theme +
  theme(
    legend.position = "none",
    strip.background = element_rect(fill = "gray90", color = NA),
    strip.text = element_text(face = "bold", size = 12)
  )

print(p1)

Distribution comparison: Laboratory ECG (WESAD) vs. Real-world Heart Rate (Driving)

3.8.1 Context-Specific Insights

Descriptive Statistics by Context
Statistic Lab (ECG) Driving (HR)
Mean 0.00 82.35
Std Dev 0.31 14.30
Median -0.02 80.62
IQR 0.17 14.21

Key Findings:

  • Laboratory stress (WESAD) shows more uniform distributions due to controlled experimental protocols
  • Real-world driving exhibits higher variability, reflecting diverse traffic conditions and individual driving styles
  • Both contexts demonstrate elevated physiological arousal, but with distinct signal characteristics

This comparison highlights the importance of validating stress detection models across multiple contexts - what works in the lab may require adaptation for real-world deployment.